Welcome to Data Science Computing!

Lecture 01: Introduction

Danilo Freire

Department of Quantitative Theory and Methods
Emory University

08 April, 2026

Welcome to QTM350 - Data Science Computing! 🥳 🎉

Lecture overview

Our agenda for today


  • Introduction
  • Motivation
  • Class logistics
  • Computer set up

Course materials

Course repository: https://github.com/danilofreire/qtm350

Course website: https://danilofreire.github.io/qtm350

This course is hosted on GitHub, which serves as our central hub for lecture materials, code examples, discussions, assignments, and final project guidelines. We will use Canvas for course administration, including submitting assignments, accessing grades, and receiving announcements. Please take some time to get to know both platforms, and reach out if you have any questions.

Note

Please remember to check the course repository regularly for updates and announcements.

Nice to meet you! 👋

Instructor

A bit about me

Visiting Assistant Professor in the Department of Quantitative Theory and Methods (QTM)

MA from the Graduate Institute Geneva, PhD from King’s College London, Postdoc at Brown University, Senior Lecturer at the University of Lincoln, UK.

Research interests: computational social science, experimental methods, policy evaluation, political violence, organised crime.

What about you? (time permitting!)


Now it’s your turn! 👋


Please introduce yourself: tell us your name, your major, one thing you really like, and something we don’t know about your city or country! 😊

My teaching philosophy


  • I love teaching and aim to make learning fun
  • Classes where students participate are the best
  • Hands-on activities help you learn better
  • I am always available to help and answer questions. And I mean it
  • Your feedback helps me improve my teaching. Please let me know what is working and what is not

Teaching assistants


  • The teaching assistants for this course will be confirmed soon

  • They will be answering questions during our lectures and holding office hours (see Canvas or the course website for office hours information)

  • They will also be grading your assignments and quizzes (with my oversight)

  • We are all here to help you! So feel free to ask questions during class, office hours, or via email 😊

Office hours

What for and what not for


  • What office hours are meant for:
    • Applying tools in practice
    • Discussion of issues related to the assignments
    • Boosting your knowledge of data science
  • What these sessions are not meant for:
    • Solving the assignments for you
    • Replacing the practice you need to develop your own coding skills

Class etiquette

  • Coding can be tough and push you out of your comfort zone. If the course pace is too fast, let us know. I expect your commitment, but I do not want anyone to fail
  • You are all keen on data science, but your backgrounds vary. That is great! Some sessions might be more engaging than others. If you are bored, help others or explore new data science areas
  • Always be respectful to each other
  • Ask questions whenever you need to!

Motivation: What is data science? 👨‍💻 👩‍💻

An old classic

Rise of the digital information age

Social media data

New data formats

Survey data

Cheap computing power

As a consequence:

  • Abundance of data available for research and for governments to make better decisions
    • Opportunities for novel research questions
    • New methods to answer longstanding research questions
  • New technologies also have social implications and can raise important policy issues
    • Ethical concerns
    • Use of technology by malicious actors
    • Government use of technology to censor or monitor citizens

Course overview and logistics 📖 📚 💻

Course objectives


  • Use data science tools for project collaboration and version control
  • Apply advanced techniques for data storage, manipulation, and querying
  • Create clear data visualisations and write well-documented code
  • Use AI tools to help with programming tasks
  • Understand the basics of containerisation and parallel computing

Key focus areas

Why reliability, reproducibility, and robustness matter

  • This course centres around three key areas of the modern data science workflow: reliability, reproducibility, and robustness
  • Reliability:
    • Ensures consistency in results across multiple runs
    • Minimises errors in data processing and analysis
    • Supports accurate interpretation of findings
  • Reproducibility:
    • Allows others to verify and build upon your work
    • Enhances the credibility of research outcomes
    • Facilitates long-term preservation of scientific knowledge
  • Robustness:
    • Enables analyses to handle unexpected data variations
    • Improves the stability of results under different conditions
    • Supports the scalability of methods to larger datasets
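To make these ideas a little more concrete, here is a minimal Python sketch. It is only an illustration, not course code: the function names `simulate_mean` and `safe_mean` and the seed value are hypothetical. Fixing the seed gives the same result on every run (reliability and reproducibility), and skipping missing values is one simple way to handle unexpected data (robustness).

```python
import random
import statistics


def simulate_mean(seed: int, n: int = 1_000) -> float:
    """Return the mean of n pseudo-random draws from a standard normal."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(draws)


def safe_mean(values: list) -> float:
    """Average the entries, skipping missing (None) values."""
    clean = [v for v in values if v is not None]
    if not clean:
        raise ValueError("no valid observations")
    return statistics.fmean(clean)


first_run = simulate_mean(seed=350)
second_run = simulate_mean(seed=350)
print(first_run == second_run)       # True: same seed, same result
print(safe_mean([1.0, None, 3.0]))   # 2.0: the missing value is ignored
```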

Key tools

  • SQL and Pandas for robust data manipulation
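As a small taste of what SQL-style data manipulation looks like, the sketch below uses only Python's built-in `sqlite3` module. The table name `scores` and the toy rows are made up for illustration and are not course data.

```python
import sqlite3

# Hypothetical toy data: (student, problem_set, score)
rows = [
    ("ana", 1, 90.0),
    ("ana", 2, 80.0),
    ("ben", 1, 70.0),
    ("ben", 2, 100.0),
]

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE scores (student TEXT, problem_set INTEGER, score REAL)"
)
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)

# Average score per student, computed by the database engine
query = """
    SELECT student, AVG(score) AS mean_score
    FROM scores
    GROUP BY student
    ORDER BY student
"""
result = conn.execute(query).fetchall()
conn.close()

print(result)  # [('ana', 85.0), ('ben', 85.0)]
```

With Pandas installed, the same summary would typically be written as `df.groupby("student")["score"].mean()` on a data frame holding these rows.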


  • Docker for consistent computational environments
  • Dask for scalable and parallel computing

Logistics

Course information

  • Syllabus: Available on our course repository and website. The course is designed to be self-contained. The syllabus includes links to the slides and Jupyter Notebooks we will use in class, along with recommended readings and problem sets. I will upload slides throughout the term as we progress.

  • Schedule: Lectures are on Mondays and Wednesdays from 2:30 to 3:45 pm

  • Office Hours: I’m available to meet you at any time. And I mean it. Please reach out a couple of days in advance and we can schedule a meeting

  • Materials:

Assignments

How you will be graded

  • Problem sets: Ten of them, due on Wednesdays at 11:59 pm (50%)
  • In-class quizzes: Five of them (30%)
  • Final project: Due on the last day of class (20%)
  • Late policy: 10% off per day late
  • Collaboration: You can discuss assignments with your classmates, but you must write your own code and submit your own work. AI is allowed.
  • Academic integrity: Please refer to the syllabus for the university’s policy on academic integrity
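The weights above can be turned into a quick grade calculator. This is a rough sketch under stated assumptions, not the official grading code: it assumes every score is on a 0 to 100 scale, that items within a component count equally, and that "10% off per day late" means losing 10% of the earned score per day. The syllabus is the authoritative source, and the function names are hypothetical.

```python
# Component weights taken from the slide: 50% / 30% / 20%
WEIGHTS = {"problem_sets": 0.50, "quizzes": 0.30, "final_project": 0.20}
LATE_PENALTY_PER_DAY = 0.10  # assumed to scale the earned score


def apply_late_penalty(score: float, days_late: int) -> float:
    """Reduce a score by 10% per late day, never going below zero."""
    factor = max(0.0, 1.0 - LATE_PENALTY_PER_DAY * days_late)
    return score * factor


def final_grade(problem_sets: list, quizzes: list, final_project: float) -> float:
    """Weighted average of the three components (all on a 0-100 scale)."""
    components = {
        "problem_sets": sum(problem_sets) / len(problem_sets),
        "quizzes": sum(quizzes) / len(quizzes),
        "final_project": final_project,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Example: ten problem sets at 80, five quizzes at 90, final project at 100
grade = final_grade([80.0] * 10, [90.0] * 5, 100.0)
print(round(grade, 2))  # 0.5 * 80 + 0.3 * 90 + 0.2 * 100 = 87

# A problem set worth 90 handed in two days late
print(round(apply_late_penalty(90.0, 2), 2))
```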

Set up 💻 🛠

Software

  • Git: Version control system. Download it here. Instructions for installation here. Feel free to configure it if you wish (instructions here), but we are going to talk about it in class.

  • GitHub: Online platform for hosting code repositories. You will use it a lot, and not only for this class. Create an account on GitHub and register for a student/educator discount.

  • There is a series of tutorials available on our course website on how to set up Git and GitHub: https://danilofreire.github.io/qtm350/tutorials/tutorials.html

OS extras

  • Windows: Install Bash for Windows. Another good tutorial can be found here. You might also want to install Chocolatey (they offer a free and open-source version)
  • Mac: Install Homebrew and Oh My Zsh
  • Linux: None (you should be good to go)

Other tools

  • We will have time to install other tools during the course. But if you want to get ahead, you can install the following:
  • VS Code: Code editor. Download it here
  • Anaconda: Python distribution. Download it here
  • Docker: Containerisation tool. Download it here
  • PostgreSQL: Database management system. Download it here

Next class

  • We will cover computational literacy, including binary and hexadecimal numbers, and character encoding systems like ASCII and Unicode
  • We will also discuss the early days of computing, focusing on Konrad Zuse’s work with digital computers and binary arithmetic
  • We will talk about the evolution of programming languages, from assembly to modern high-level languages like Python, and the differences between compiled and interpreted languages
  • There will be time for questions about installing the terminal. You do not need it for next week, but consider installing it soon, as it will be necessary in two weeks. Please create a GitHub educational account if you do not have one 😉
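If you want a preview of next class, the short Python snippet below shows one number in binary and hexadecimal, and one character in ASCII and Unicode. It uses only built-in functions; the example values are arbitrary.

```python
number = 350

# The same integer written in base 2 and base 16
print(bin(number))           # 0b101011110
print(hex(number))           # 0x15e

# Converting the text representations back to an integer
print(int("101011110", 2))   # 350
print(int("15e", 16))        # 350

# Characters are stored as numbers (code points)
print(ord("A"))              # 65
print(chr(65))               # A

# ASCII covers "A" in one byte; "é" (U+00E9) needs two bytes in UTF-8
e_acute = "\u00e9"
print("A".encode("ascii"))       # b'A'
print(e_acute.encode("utf-8"))   # b'\xc3\xa9'
print(len(e_acute), len(e_acute.encode("utf-8")))  # 1 2
```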

And that’s all for today! 😊

Questions?

Thank you very much for your attention and have a great day! 😊